Abstract Biological information processing networks consist of many
components, which are coupled by an even larger number of complex
multivariate interactions. However, analyses of data sets from fields as
diverse as neuroscience, molecular biology, and behavior have reported that
observed statistics of states of some biological networks can be approximated
well by maximum entropy models with only pairwise interactions among the
components. Based on simulations of random Ising spin networks with p-spin
(p > 2) interactions, here we argue that this reduction in complexity can
be thought of as a natural property of densely interacting networks in
certain regimes, and not necessarily as a special property of living systems.
By connecting our analysis to the theory of random constraint satisfaction
problems, we suggest a reason for why some biological systems may operate in
this regime.

Keywords collective dynamics • p-spin models • numerical simulations 


1 Introduction 

The increased throughput of biological experiments now allows joint
measurements of activities of many basic components underlying collective
information processing in biological systems. Such multivariate data must be
interpreted within models. Within this context, Maximum Entropy (MaxEnt)
models [1] have been some of the most successful. The logic of such models
is that, ultimately, one wants to find an approximation $Q(x)$ to the joint
probability distribution $P(x)$ of the observed multivariate data
$\{x_i\} = x$, $i = 1, \dots, N$. Unfortunately, for a large number of
components, $N$, the datasets can never be large enough to estimate $P$
directly from data. One may only be able to estimate various expectation
values of functions of the data, $\langle f_\kappa(x) \rangle_P = \bar f_\kappa$,
$\kappa = 1, \dots, K$. Then one can search for $Q$ that matches the reliable
estimates. If additionally one requests that $Q$ has no structure beyond that
required by the matching, then this is equivalent to asking for $Q$ with the
maximum entropy, subject to the constraints imposed by the matching,

$$Q = \arg\max_{Q'} \Big[ S(Q') - \sum_{\kappa} \lambda_\kappa \big( \langle f_\kappa \rangle_{Q'} - \bar f_\kappa \big) \Big], \qquad (1)$$

where the entropy $S$ is defined as

$$S(Q) \equiv S(x) = -\sum_{x} Q(x) \log_2 Q(x). \qquad (2)$$

A common special case of this general formulation is when the variables
are binary, which we will denote as $x_i \equiv \sigma_i \in \{-1, 1\}$, and
the data constrain their various low-order correlation functions, such as
$\langle \sigma_i \rangle$ or $\langle \sigma_i \sigma_j \rangle$. In this
case, the MaxEnt approximation $Q$ is [2]:

$$Q(\sigma) = \frac{1}{Z} \exp\Big( -\sum_{i} h_i \sigma_i - \sum_{ij} J_{ij} \sigma_i \sigma_j - \sum_{ijk} K_{ijk} \sigma_i \sigma_j \sigma_k - \cdots \Big). \qquad (3)$$

Here every constrained correlation function gets a term in the exponent, $Z$
is the partition function, and the Lagrange multipliers
$h_i, J_{ij}, K_{ijk}, \dots$ must be chosen to satisfy the constraints. This
is generally not an analytically solvable problem, and even numerics are hard.

Equation (3) has the form of the Ising spin problem, allowing a wholesale
import of intuition from statistical physics to MaxEnt data analysis.
Correspondingly, these ideas have been applied to many biological systems in
the last decade, starting with neurophysiological recordings from salamander
retina. There $N$ was a few dozen neurons, and $\sigma_i = \pm 1$ corresponded
to the $i$'th neuron spiking/not spiking at a certain time. A surprising result
was that truncating Eq. (3) at the quadratic order in $\sigma_i$ (or, in other
words, constraining $Q$ up to pairwise correlations) provided a good fit to
$P$. We will refer to this finding as pairwise sufficiency from now on.

The pairwise sufficiency was later found in other neural systems
(though it is violated at larger $N$ [15]). It was observed further for natural
images; for discrete, yet non-binary $x$ in sequencing data; and
for real-valued velocities of birds in flocking experiments [19]. Even for some
non-MaxEnt approaches, similar findings were also reported. One can
interpret these observations in the context of biological systems operating
in a special regime. However, the wide applicability of the findings
suggests an alternative: pairwise sufficiency may emerge generically for a wide
class of biological and non-biological networks. Indeed, sparse sampling
of variables in experiments is similar to decimation in statistical physics, and
the resulting renormalization group-like flow may decrease the importance
of the higher order couplings [24]. Further, in a perturbative regime, where
fluctuations away from the independence are small, the pairwise sufficiency
also appears [25]. Here we propose one more possibility, arguing that the
pairwise sufficiency arises naturally in strongly coupled multivariate systems.

In what follows, we first introduce the idea in an intuitive toy model, and
then develop it numerically by analyzing randomly generated networks. We
show that the pairwise MaxEnt models approximate such random networks
surprisingly well. Further, we explore distributions of states of these networks
and their models, leading to an explanation of the pairwise sufficiency.
Finally, we discuss why diverse biological systems may find themselves in the
pairwise-sufficient regime, but we leave it for the future to investigate if this
mechanism is responsible for the sufficiency in experimental networks.


2 Results 

2.1 Building Intuition: Networks of XORs 

For a tractable example of emergence of the pairwise sufficiency, we focus
on Boolean gates. These are the limit of Ising spin networks in the low
temperature (strong coupling) regime [2, 26]. For example,
$\sigma_3 = \sigma_1\,\mathrm{OR}\,\sigma_2$ can be written as
$P(\sigma_3|\sigma_2, \sigma_1) = \frac{1}{Z} \exp[J(\sigma_1\sigma_3 + \sigma_2\sigma_3 + \sigma_3)]$
with $J \to \infty$. If also $P(\sigma_1 = \pm 1) = P(\sigma_2 = \pm 1) = 1/2$,
then $\frac{1}{4} P(\sigma_3|\sigma_2, \sigma_1) = P(\sigma_1, \sigma_2, \sigma_3)$.
Thus the joint probability distribution for OR has the pairwise MaxEnt form,
Eq. (3). Similarly, for $J \to \infty$, $\sigma_3 = \sigma_1\,\mathrm{AND}\,\sigma_2$
is equivalent to
$P(\sigma_1, \sigma_2, \sigma_3) = \frac{1}{Z} \exp[J(\sigma_1\sigma_3 + \sigma_2\sigma_3 - \sigma_3)]$.
This is again a pairwise MaxEnt distribution. However,
$\sigma_3 = \sigma_1\,\mathrm{XOR}\,\sigma_2$ is equivalent to
$P(\sigma_1, \sigma_2, \sigma_3) = \frac{1}{Z} \exp(-K \sigma_1 \sigma_2 \sigma_3)$,
$K \to \infty$. This is an example of a purely third-order gate, with no pairwise
contributions to its MaxEnt representation.
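The gate constructions above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a large but finite coupling (J = K = 20 as an arbitrary stand-in for the infinite limit), reads +1 as TRUE, normalizes the OR and AND conditionals separately for each input pair, and uses illustrative helper names (`gate_conditional`, `most_likely_output`, `xor_joint`).

```python
import itertools
import math

SPINS = (-1, 1)  # +1 is read as TRUE, -1 as FALSE


def gate_conditional(J, bias):
    """P(s3 | s1, s2) proportional to exp[J (s1 s3 + s2 s3 + bias s3)]."""
    table = {}
    for s1, s2 in itertools.product(SPINS, repeat=2):
        weights = {s3: math.exp(J * (s1 * s3 + s2 * s3 + bias * s3)) for s3 in SPINS}
        z = sum(weights.values())
        table[(s1, s2)] = {s3: w / z for s3, w in weights.items()}
    return table


def most_likely_output(table):
    """Map each input pair to the output spin with the highest probability."""
    return {inputs: max(p, key=p.get) for inputs, p in table.items()}


def xor_joint(K):
    """P(s1, s2, s3) proportional to exp(-K s1 s2 s3): a purely third-order term."""
    states = list(itertools.product(SPINS, repeat=3))
    weights = [math.exp(-K * s1 * s2 * s3) for s1, s2, s3 in states]
    z = sum(weights)
    return {s: w / z for s, w in zip(states, weights)}


J = 20.0  # assumed finite stand-in for the limit J, K -> infinity
or_gate = most_likely_output(gate_conditional(J, bias=+1))   # bias term +s3
and_gate = most_likely_output(gate_conditional(J, bias=-1))  # bias term -s3
xor_probs = xor_joint(J)

for s1, s2 in itertools.product(SPINS, repeat=2):
    print(s1, s2, "OR:", or_gate[(s1, s2)], "AND:", and_gate[(s1, s2)])
for state, p in xor_probs.items():
    print(state, round(p, 3))
```

With these assumptions, only the four states with $\sigma_3 = -\sigma_1 \sigma_2$ carry appreciable weight in the XOR distribution, which is the ±1 encoding of the XOR truth table.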

In Fig. 1 we now couple a few such third-order gates to each other.
The spins $\sigma_1, \sigma_2, \sigma_3$ are connected by an XOR (left column,
first row), and there is no simpler effective representation of the network
(right column, first row). We then add the fourth spin,
$\sigma_4 = \sigma_2\,\mathrm{XOR}\,\sigma_3$ (left column, second row).
However, then $\sigma_4 = \sigma_1$. This can be represented as an effective model
$P(\sigma_4|\sigma_1, \dots, \sigma_3) = P(\sigma_4|\sigma_1) = \frac{1}{Z} \exp(J\sigma_1\sigma_4)$,
$J \to \infty$. Thus the third order XOR interaction is equivalent to a pairwise
EQUAL interaction (right column, second row). The latter is effective and
nonlocal, in the sense that $\sigma_4$ is coupled to $\sigma_1$, with which it
does not interact in the true model. We further add
$\sigma_5 = \sigma_2\,\mathrm{XOR}\,\sigma_4$ (third row), and this is equivalent
to an effective model $\sigma_5 = \sigma_3$. In short, of the three third order
interactions, each constraining one spin and hence “carrying” 1 bit of
information, two can be represented without any error as pairwise interactions.
Now the network can exist in four distinct global states out of $2^5 = 32$,
determined by $\sigma_{1,2} = \pm 1$ (namely,
$\sigma^{(1)} = \{-1, -1, -1, -1, -1\}$, $\sigma^{(2)} = \{-1, +1, +1, -1, +1\}$,
$\sigma^{(3)} = \{+1, -1, +1, +1, +1\}$, and $\sigma^{(4)} = \{+1, +1, -1, +1, -1\}$).
Thus it is far from the perturbative regime of Ref. [25]. We can grow the
network further so that each new spin is coupled by a third order interaction
to two existing spins. Then the number of spins, $N$, and the number of
interactions, $M$, are related as $N = M + 2$, and all but one third order
interaction can be represented as a second order one. In other words, an
effective pairwise model has an error of only $1/(N - 2)$ when accounting for
the statistics of the network states.

Fig. 1 Emergence of pairwise interactions in a network of XOR gates. On
the left (“True model”), we show small networks of spins $\sigma_i$ (grey
circles). The spins interact (yellow squares) by means of third order XOR
interactions. On the right (“Effective model”), an equivalent network is shown,
where some of the XORs get replaced by EQUAL and assignment operations, which
are the second and the first order interactions, respectively.


Alternatively, we can add more XORs without adding new nodes. This
may be inconsistent or redundant with already existing couplings. Or in a
case such as $\sigma_1 = \sigma_2\,\mathrm{XOR}\,\sigma_4$ (fourth row), this
sets $\sigma_2 = -1$ (thus adding the bias, or the first order term), and all
other spins are equal to each other, so that the pairwise effective model is
exact. Finally, adding $\sigma_3 = \sigma_4\,\mathrm{XOR}\,\sigma_5$ sets
every spin to $-1$, and makes even the first-order model exact (bottom row).
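The counting in this toy model is easy to verify by brute force. The sketch below is an illustration only: it enumerates all $2^5$ states of the five-spin network and keeps those consistent with the deterministic XOR gates, using zero-based spin indices and the convention that +1 is TRUE (so that XOR of two spins is $-\sigma_a \sigma_b$). The function names are assumptions made for this example.

```python
import itertools


def xor(a, b):
    """XOR on +/-1 spins: +1 (TRUE) when the inputs differ."""
    return 1 if a != b else -1


def allowed_states(n, gates):
    """All n-spin states satisfying every gate (out, in1, in2): s[out] = s[in1] XOR s[in2]."""
    return [
        s for s in itertools.product((-1, 1), repeat=n)
        if all(s[out] == xor(s[a], s[b]) for out, a, b in gates)
    ]


# Zero-based indices: spin k in the text is index k - 1 here.
gates = [
    (2, 0, 1),  # sigma_3 = sigma_1 XOR sigma_2 (first row)
    (3, 1, 2),  # sigma_4 = sigma_2 XOR sigma_3 (second row)
    (4, 1, 3),  # sigma_5 = sigma_2 XOR sigma_4 (third row)
    (0, 1, 3),  # sigma_1 = sigma_2 XOR sigma_4 (fourth row)
    (2, 3, 4),  # sigma_3 = sigma_4 XOR sigma_5 (bottom row)
]

three_gates = allowed_states(5, gates[:3])
four_gates = allowed_states(5, gates[:4])
five_gates = allowed_states(5, gates)

print(len(three_gates), "states with three gates:", three_gates)
print(len(four_gates), "states with four gates:", four_gates)
print(len(five_gates), "state with five gates:", five_gates)
```

With three gates the enumeration recovers the four global states listed above, in each of which $\sigma_4 = \sigma_1$ and $\sigma_5 = \sigma_3$; the fourth gate leaves two states with $\sigma_2 = -1$, and the fifth leaves only the all-down state.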

We see that a network of XORs can exhibit the pairwise sufficiency
nonperturbatively. Of course, more realistic physical or biological systems are
stochastic ($J, K < \infty$), and such simple arguments will not work.
However, the example suggests that effective pairwise models can approximate
more complex networks well when nodes in the network interact strongly
and densely, and the space of network states is sufficiently constrained. In
such cases, there are many pairs of nodes that are relatively strongly
correlated simply by chance, allowing replacement of higher order interactions
with pairwise ones. In what follows, we will develop this intuition further
using numerical studies.


2.2 Pairwise approximations to random networks with higher order 
interactions 


To verify our intuition, we proceed by generating random networks that
have only higher order nondeterministic interactions among spins (p-spin
models, $p > 2$ [27]). We then quantify the accuracy with which lower order
MaxEnt models approximate these networks. We explore networks with
$p = 3, 4$ to ensure that our findings do not depend on the exact structure of
the true higher order interactions. Further, systems with only fourth order
couplings have the $Z_2$ symmetry, and thus cannot include any first order
terms in their MaxEnt approximations, Eq. (3). Studying them will allow us
to understand if the eventual freezing to a single well-defined state, as in the
last row of Fig. 1, is crucial for the pairwise sufficiency, or if it emerges even
for nonperturbative networks with more than one highly probable state.

To generate the random networks, we first specify $N$, the number of nodes,
and $M$, the number of interactions. Then for each interaction
$\mu = 1, \dots, M$, we generate its coupling constant from a zero-mean Gaussian
distribution with a certain variance $s^2$. We then choose three or four spins
at random to couple. The overall probability of states for these networks is

$$P_3(\sigma) = \frac{1}{Z} \exp\Big( -\sum_{\mu=1}^{M} K_\mu \sigma_{\mu_1} \sigma_{\mu_2} \sigma_{\mu_3} \Big), \quad \text{3-spin model}, \qquad (4)$$

$$P_4(\sigma) = \frac{1}{Z} \exp\Big( -\sum_{\mu=1}^{M} K_\mu \sigma_{\mu_1} \sigma_{\mu_2} \sigma_{\mu_3} \sigma_{\mu_4} \Big), \quad \text{4-spin model}, \qquad (5)$$

where $\mu_1 < \mu_2 < \mu_3 < \mu_4$ so that the spins do not self-interact. To
specify these distributions (and later calculate various errors of
approximations), we need to know $Z$. To decouple studying the problem of the
pairwise sufficiency from a hard problem of efficient sampling, we focus on
$N \le 22$, which allows us to estimate $Z$ by direct summation fast enough to
do it many times and collect statistics. We generate many such distributions
$P_3$ and $P_4$, every time picking random $N \in [10, 22]$, $M \in [1, 250]$,
and $s \in [0.2, 2.0]$.
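The generation procedure can be sketched in a few lines. The code below is a minimal illustration under stated assumptions rather than the authors' implementation: the energy enters the exponent with a minus sign (immaterial for zero-mean couplings), the same set of spins may be drawn for more than one interaction, and the function names, the seed, and the small example size ($N = 10$, $M = 30$, $s = 1$) are choices made only for this example.

```python
import itertools
import math
import random


def random_pspin_network(n, m, s, p, rng):
    """Draw m interactions: a Gaussian coupling (std s) and p distinct spins each."""
    return [
        (rng.gauss(0.0, s), tuple(sorted(rng.sample(range(n), p))))
        for _ in range(m)
    ]


def exact_distribution(n, interactions):
    """Return all 2^n states and their probabilities, with Z from direct summation."""
    states = list(itertools.product((-1, 1), repeat=n))
    energies = [
        sum(k * math.prod(sigma[i] for i in spins) for k, spins in interactions)
        for sigma in states
    ]
    e_min = min(energies)  # shift energies so that exp() cannot overflow
    weights = [math.exp(-(e - e_min)) for e in energies]
    z = sum(weights)
    return states, [w / z for w in weights]


def entropy_bits(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


rng = random.Random(0)  # fixed seed, used only for reproducibility
n = 10
network = random_pspin_network(n, m=30, s=1.0, p=4, rng=rng)
states, p4 = exact_distribution(n, network)
print("normalized entropy:", entropy_bits(p4) / n)
```

Direct summation costs $2^N$ energy evaluations per network, which is why this approach is practical only for small $N$ such as the range used here.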

For each generated distribution, we estimate its individual and pairwise
marginals $P(\sigma_i)$, $P(\sigma_i, \sigma_j)$ for all $i, j = 1, \dots, N$
by direct marginalization (hereafter we drop subscripts 3 or 4 for $P$ if it
does not cause confusion). We then calculate the first order (or independent)
MaxEnt approximation

$$Q^{(1)}(\sigma) = \prod_{i=1}^{N} P(\sigma_i). \qquad (6)$$


Next we fit the pairwise MaxEnt model $Q^{(2)}$ to $P$. While good algorithms
exist for this purpose, it is unclear if their assumptions are
satisfied by our networks. Trying again to decouple the problems of efficient
inference and the pairwise sufficiency, we choose a classic, well understood
Iterative Proportional Fitting Procedure (IPFP) algorithm [28]. That is, we
start with $Q^{(1)}$ as a guess for $Q^{(2)}$, calculate
$Q^{(2)}(\sigma_i, \sigma_j)$, and redefine

$$Q^{(2)}(\sigma) \leftarrow Q^{(2)}(\sigma)\, \frac{P(\sigma_i, \sigma_j)}{Q^{(2)}(\sigma_i, \sigma_j)}. \qquad (7)$$

We cycle through all pairs $i, j$, and iterate until
$Q^{(2)}(\sigma_i, \sigma_j) \approx P(\sigma_i, \sigma_j)$ up to a relative
error of $10^{-5}$. This is achieved within $\sim 10^0 \dots 10^4$ iterations
depending on how close the final $Q^{(2)}$ is to $Q^{(1)}$. We verified that
starting with different initial conditions results in the same solution, as it
should.
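The IPFP cycle described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used for the reported results: the target is a hypothetical 3-spin network on five spins with weak couplings (so every state has nonzero probability and no division by zero occurs), the relative error of each pair marginal is measured just before that pair is updated, and all function names and the seed are invented for this example.

```python
import itertools
import math
import random


def pair_marginal(states, probs, i, j):
    """Joint distribution of spins i and j under probs."""
    marg = {(a, b): 0.0 for a in (-1, 1) for b in (-1, 1)}
    for sigma, p in zip(states, probs):
        marg[(sigma[i], sigma[j])] += p
    return marg


def independent_model(states, probs):
    """First order model: product of the single-spin marginals of probs."""
    n = len(states[0])
    up = [sum(p for sigma, p in zip(states, probs) if sigma[i] == 1) for i in range(n)]
    return [
        math.prod(up[i] if sigma[i] == 1 else 1.0 - up[i] for i in range(n))
        for sigma in states
    ]


def ipfp_pairwise(states, p_true, tol=1e-5, max_sweeps=10_000):
    """Match all pairwise marginals of p_true, starting from the independent model."""
    n = len(states[0])
    pairs = list(itertools.combinations(range(n), 2))
    targets = {pair: pair_marginal(states, p_true, *pair) for pair in pairs}
    q = independent_model(states, p_true)
    for sweep in range(1, max_sweeps + 1):
        worst = 0.0
        for i, j in pairs:
            current = pair_marginal(states, q, i, j)
            target = targets[(i, j)]
            worst = max(worst, max(abs(current[k] / target[k] - 1.0) for k in target))
            # Rescale Q by the ratio of the target to the current pair marginal.
            q = [
                qs * target[(sigma[i], sigma[j])] / current[(sigma[i], sigma[j])]
                for sigma, qs in zip(states, q)
            ]
        if worst < tol:
            return q, sweep
    return q, max_sweeps


# Hypothetical toy target: ten random 3-spin terms on five spins.
rng = random.Random(1)
n = 5
states = list(itertools.product((-1, 1), repeat=n))
terms = [(rng.gauss(0.0, 0.5), rng.sample(range(n), 3)) for _ in range(10)]
weights = [
    math.exp(-sum(k * sigma[a] * sigma[b] * sigma[c] for k, (a, b, c) in terms))
    for sigma in states
]
z = sum(weights)
p_true = [w / z for w in weights]

q2, sweeps = ipfp_pairwise(states, p_true)
print("stopped after", sweeps, "sweeps")
```

Each update keeps $Q^{(2)}$ normalized, because the rescaled pair marginal sums to one; the stored full distribution makes the cost of one sweep grow as $N^2 2^N$, which again limits this direct approach to small networks.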

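To measure the quality of the MaxEnt models, we calculate the Kullback-Leibler
(KL) divergence between the true distribution $P_{3,4}$ and each
approximation, normalized by the number of spins in the system:

$$\mathcal{D}^{(1)} = \frac{D_{\rm KL}\big(P \,\|\, Q^{(1)}\big)}{N}, \qquad (8)$$

$$\mathcal{D}^{(2)} = \frac{D_{\rm KL}\big(P \,\|\, Q^{(2)}\big)}{N}, \qquad (9)$$

where $D_{\rm KL}(P \,\|\, Q) = \sum_\sigma P(\sigma) \log_2 [P(\sigma)/Q(\sigma)]$.
Since our maximum $N$ is rather small, this is done by direct summation.
Notice that both $\mathcal{D}^{(1)}$ and $\mathcal{D}^{(2)}$ are between zero
(perfect fit) and one (the worst fit) if single-spin marginals of $Q$ and $P$
are equal.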
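As a small worked example of the normalized divergence, consider a distribution concentrated on two mirror states, all spins up or all spins down, each with probability 1/2. All single-spin means vanish, so the independent model is uniform, and the normalized KL divergence equals $1 - 1/N$. The sketch below checks this for $N = 8$; the function name and the example size are assumptions made for illustration.

```python
import itertools
import math


def normalized_kl(p, q, n):
    """KL divergence D(P || Q) in bits, divided by the number of spins n."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0) / n


n = 8
states = list(itertools.product((-1, 1), repeat=n))

# Two mirror states, each with probability 1/2.
up, down = (1,) * n, (-1,) * n
p_mirror = [0.5 if s in (up, down) else 0.0 for s in states]

# All single-spin means vanish, so the independent model is uniform.
q_uniform = [2.0 ** -n] * len(states)

d1 = normalized_kl(p_mirror, q_uniform, n)
print(d1)  # 0.875, i.e. 1 - 1/n
```

The entropy of this distribution is 1 bit, so the divergence from the uniform model is $N - 1$ bits, or $1 - 1/N$ per spin.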

In Fig. 2 we plot the values of $\mathcal{D}^{(1)}$ and $\mathcal{D}^{(2)}$
measured over different ensembles of random networks vs. the normalized entropy
of the network's state space, $\mathcal{S} = S(\sigma)/N$, which also varies
between 0 and 1. For all types of networks and approximation, the quality of
fit is high ($\mathcal{D}$ is low) when $\mathcal{S} \sim 1$, so that the
networks are unconstrained, and nearly all states are possible. This is trivial
since even the zeroth order approximation (each spin up or down with 50%
probability) would work well here.

As $\mathcal{S}$ decreases, the fit errors increase. When $\mathcal{S}$ reaches
small values, the independent approximation, $\mathcal{D}^{(1)}$, starts
behaving differently for the different network types. In the 4-spin case, by
construction, $P(\sigma) = P(-\sigma)$. Thus $\langle \sigma_i \rangle = 0$ for
any $i$, and the best independent approximation is the uniform distribution.
For this construction, the smallest possible entropy is $\mathcal{S} = 1/N$,
where the network exists in two mirror states, and there the error of
$Q^{(1)}$ is $\mathcal{D}^{(1)} = 1 - 1/N$. In contrast, a 3-spin network
freezes at $\mathcal{S} = 0$, and each spin is strongly biased (as in our XOR
networks above). Thus the independent approximation provides a perfect fit in
this case.

The distinction between $P_3$ and $P_4$ vanishes for the pairwise MaxEnt
approximation. Here, for both 3- and 4-spin networks, the fit errors behave
similarly: for $\mathcal{S}$ decreasing from 1, $\mathcal{D}^{(2)}$ grows from 0
and reaches its peak of about $0.25 \dots 0.3$ near
$\mathcal{S} \sim 0.5 \dots 0.6$. This is already interesting:
$\mathcal{D}^{(2)}$ almost never goes above 0.3 for all networks we tried. Thus
even in some of the worst cases, pairwise approximation is quite good! Further,
for even smaller entropies, $\mathcal{D}^{(2)}$ rapidly drops, approaching zero
faster than linearly in $\mathcal{S}$. For $\mathcal{S} \approx 0.25$,
$\mathcal{D}^{(2)} \approx 0.07$. It is even smaller,
$\mathcal{D}^{(2)} \approx 0.04$, for the quartic case. This is because
$\min \mathcal{S}_4 = 1/N$, so that the whole $\mathcal{D}^{(2)}_4$ curve is
slightly shifted compared to $\mathcal{D}^{(2)}_3$ at low $\mathcal{S}$.

Fig. 2 Error of the MaxEnt fits vs. the normalized entropy of the network
state space, $\mathcal{S}$. The left panel shows errors of the independent,
$\mathcal{D}^{(1)}$, and the pairwise, $\mathcal{D}^{(2)}$, approximations for
3-spin networks. We used ~800 random networks with $11 < N < 20$ spins and with
a varying number of interactions, $M$. We partitioned all the networks by their
$\mathcal{S}$ in bins of width of 0.1 and calculated the mean and the standard
deviation of $\mathcal{D}$ for each bin. These are indicated by triangles and
the error bars. Wherever the data points for individual networks showed
little scatter, we plotted these points instead of the bin averages. The middle
panel presents similar data, $\mathcal{D}^{(1)}$ and $\mathcal{D}^{(2)}$, for
4-spin networks. Here over 4000 random networks were generated with
$11 < N < 22$. $\mathcal{D}^{(2)}$ was again averaged within ten bins, and the
means and the standard deviations are plotted. For $\mathcal{D}^{(1)}$, data for
individual networks are presented. These merge into a perfect straight line due
to the $Z_2$ symmetry of 4-spin distributions. For both the 3- and the 4-spin
cases, the pairwise sufficiency is clear at low $\mathcal{S}$. The right panel
replots the $\mathcal{D}^{(2)}$ data for the 4-spin networks, but splits them
according to $\alpha$, which measures the average strength of interactions per
spin within a network. Large $\alpha$ curves are significantly below their small
$\alpha$ counterparts, indicating that, other things being equal, densely
and strongly interacting networks are more likely to be pairwise sufficient.
Notice that large $\alpha$ curves end abruptly since such networks cannot have
large $\mathcal{S}$.

In summary, for all the networks we have considered, pairwise sufficiency
emerges robustly at low (but not too low) entropy. In fact, at
$\mathcal{S} \sim 0.25$, our networks can be in more than
$2^{S} = 2^{N\mathcal{S}} \approx 2^5 = 32$ highly probable states.
Thus the networks are not totally frozen, and yet the pairwise approximation
is nearly sufficient! Crucially, this finding is robust to the changes in the
network size: $\mathcal{D}$ vs. $\mathcal{S}$ curves are stable over the entire
range of $N$ we explored.

Within a single narrow bin of $\mathcal{S}$, $\mathcal{D}^{(2)}$ may still have
a rather large range. We explore this variability in the rightmost panel of
Fig. 2. For this, we define $\alpha = sM/N$ (recall that $s$ is the standard
deviation of the random couplings used to generate the networks). $\alpha$
measures the strength of interactions (or constraints) per spin, analogously to
a similar parameter in the random constraint satisfaction problems [29, 30]. For
quartic networks, where we have enough samples, we then plot
$\mathcal{D}^{(2)}$ vs. $\mathcal{S}$ for different ranges of $\alpha$.
Crucially, we find that, for the same $\mathcal{S}$, a larger $\alpha$ results
in better pairwise fits. In other words, a denser and stronger interacting
network is more likely to be pairwise sufficient. This is potentially good news
for MaxEnt approaches to biological systems, which are known for the immense
complexity of the underlying biophysical interactions.

Fig. 3 The pairwise sufficiency emerges in the nonperturbative regime.
We select all highly probable states $\sigma^\mu$, defined somewhat arbitrarily
as $P(\sigma^\mu) > 0.001$. We then calculate the overlap for all pairs of such
states. Finally, we plot the histogram of the magnitudes of the overlaps from
all 4-spin networks with $N > 20$, and with $0.1 < \mathcal{S} < 0.3$. Such
networks can exist in many states, but still very few compared to $2^N$. Most
of the overlap magnitudes are away from 1, indicating small to moderate
similarity among the highly probable states. Thus these states are broadly
dispersed and do not cluster together.

We conclude this section by stressing that high probability states of
pairwise sufficient p-spin networks are not close to each other. To illustrate
this, we focus on the 4-spin case with $N > 20$, and on small but not negligible
$\mathcal{S}$. We then evaluate the magnitude of the overlap,
$|\sigma^\mu \cdot \sigma^\nu|/N$, among all highly probable network states and
plot the distribution of the overlaps in Fig. 3. For purely randomly
distributed states, we would expect the standard deviation of overlaps to be
~0.22, and a peak near zero. And we would expect magnitudes of overlaps near 1
if all highly probable states were clustered near a dominant one. Instead, the
distribution in Fig. 3 is not concentrated near 1, and the standard deviation
is $\approx 0.39$. Therefore, there is some clustering of probable states, but
certainly not strong clustering. Thus the state space of our networks cannot be
described as small fluctuations around a dominant state, and the pairwise
sufficiency here is not perturbative [25]. It likely emerges due to a
previously not investigated mechanism.
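The reference value for random states follows from a short calculation: for two independent, uniformly random states, the overlap $\sigma \cdot \sigma'/N$ is a sum of $N$ independent $\pm 1/N$ terms, so its mean is zero and its standard deviation is $1/\sqrt{N}$. The sketch below evaluates this exactly from the binomial distribution of the number of disagreeing spins; the function name is illustrative, and the choice of $N = 21$ and $N = 22$ reflects the network sizes with $N > 20$.

```python
import math


def overlap_std_random(n):
    """Exact std of q = (sigma . sigma') / n for two independent uniform random states."""
    # The number k of disagreeing spins is Binomial(n, 1/2), and q = (n - 2k) / n.
    probs = [math.comb(n, k) / 2 ** n for k in range(n + 1)]
    overlaps = [(n - 2 * k) / n for k in range(n + 1)]
    mean = sum(p * q for p, q in zip(probs, overlaps))
    var = sum(p * (q - mean) ** 2 for p, q in zip(probs, overlaps))
    return math.sqrt(var)


for n in (21, 22):
    print(n, round(overlap_std_random(n), 3), round(1 / math.sqrt(n), 3))
```

For $N = 21$ this gives approximately 0.22, matching the random-state baseline quoted above.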


2.3 The structure of the state space of the pairwise sufficient networks 

The toy example of the XOR network suggests that the pairwise sufficiency 
may emerge when the network “freezes” to a few (but not necessarily just 
one) highly probable states, and different relatively tightly coupled clusters 
of spins decouple from each other. Is this also true for our networks with 
a nonzero temperature? How do energy landscapes of the sufficient and the 
insufficient networks differ from each other? And are the MaxEnt fits for 
both cases structurally different? 

To start exploring this, we estimate $h_i$ and $J_{ij}$ for $Q^{(2)}$ inferred
using IPFP. We do this by choosing $N(N + 1)/2$ states with the highest
probability from $Q^{(2)}$. We get the energy of each such state as
$E(\sigma) = -\log P(\sigma)$ and then solve the linear regression problem to
find the coupling constants from the states and their energies. Finally, we
calculate the eigenvalue spectra of the inferred $J_{ij}$, having set
$J_{ii} = 0$. The averaged spectra are shown in Fig. 4 for different
combinations of $\mathcal{S}$ and $\mathcal{D}^{(2)}$. We see that the success
of $Q^{(2)}$ is correlated with the magnitude of the eigenvalues of $J_{ij}$:
larger magnitudes, which correspond to stronger interactions and more
constrained distributions, give the pairwise sufficiency. This is true
irrespective of $\mathcal{S}$ (though $\mathcal{S}$ and $\alpha$ are dependent,
as we have discussed). Crucially, if one rescales the spectra by their largest
magnitude negative eigenvalue (Fig. 4, inset), then all spectra collapse. Thus
the $J_{ij}$ (or the energy landscapes) of the pairwise sufficient and the
pairwise insufficient fits are not intrinsically different: a rescaling (change
in temperature) can morph one into the other.

Fig. 4 Spectra of the pairwise coupling matrices $J_{ij}$ for MaxEnt
approximations to random 4-spin networks, $N > 20$. We order eigenvalues from
the smallest (lowest energy) to the largest (highest energy). We then plot the
mean spectra (with standard deviations, where it does not obstruct the figures),
averaged over different subsets of 4000 networks. Low $\mathcal{S}$ subset
corresponds to $0.1 < \mathcal{S} < 0.2$. Such networks are fit extremely well
by pairwise models, with mean $\mathcal{D}^{(2)} \approx 6 \cdot 10^{-3}$.
High $\mathcal{S}$ corresponds to $0.3 < \mathcal{S} < 0.4$. Here the pairwise
fits are bad, so that the mean $\mathcal{D}^{(2)} \approx 0.15$. Finally, for
the intermediate range $0.2 < \mathcal{S} < 0.3$, the quality of fits is
diverse. We further partition this range into well fitted,
$\mathcal{D}^{(2)} < 0.06$, and badly fitted, $\mathcal{D}^{(2)} > 0.12$,
subsets, leaving intermediate fits off the plot. The four average spectra show
that the pairwise sufficiency is directly correlated with the scale of the
spectra, with larger magnitude eigenvalues resulting in smaller
$\mathcal{D}^{(2)}$. The inset shows the averages for each of the four ranges,
where each spectrum is normalized by its largest magnitude negative eigenvalue.
The four curves are very close for much of their range. (Horizontal axis:
eigenvalue rank.)

Having analyzed the pairwise sufficient and insufficient solutions, we
now focus on the landscapes of the p-spin networks themselves. The freezing
that results in the decrease of $\mathcal{D}^{(2)}$ and the growth of the
eigenvalues of $J$ can create landscapes of different types. For example, the
highly probable states may be essentially uncorrelated, reminiscent of the
landscapes of the Hopfield network in the ferromagnetic phase. Alternatively, as
in our XOR network, entire blocks of spins can merge into strongly correlated
clusters, which then decouple from each other. Then the low energy network
states will be direct products of the states of the clusters. To disambiguate
the two scenarios, we calculate pairwise spin-spin correlation
$c_{ij} = \mathrm{cov}(\sigma_i, \sigma_j) / (\mathrm{std}\,\sigma_i \, \mathrm{std}\,\sigma_j)$
by direct summation of $P$ (note that, for 4-spin networks, the correlation is
equal to the covariance since $\langle \sigma_i \rangle = 0$). We then cluster
the spins based on the absolute value of their correlations. Figure 5 shows the
clusters for 4-spin networks (note that since the number, the size, and the
spin assignment for clusters are different for each network, we only show
typical cases). A network with a near-zero $\mathcal{D}^{(2)}$ (left panel)
shows a perfect partitioning into two clusters; $S(\sigma) \approx 2$ bits is a
result of this partitioning. As networks with larger entropies are considered,
the number of clusters increases, and their boundaries become fuzzy, leading to
worse MaxEnt fits. When the definite cluster structure disappears,
$\mathcal{D}^{(2)}$ grows dramatically. Thus the existence of well-defined spin
clusters is correlated with the pairwise sufficiency.

Fig. 5 4-spin networks decouple into spin clusters in the pairwise
sufficient regime. The three panels show typical values of $|c_{ij}|$ for
$N = 21$, sorted into correlated clusters. Left panel: a nearly perfectly
pairwise sufficient network with $S = 2.05$ bits and
$\mathcal{D}^{(2)} = 8.0 \cdot 10^{-5}$. In accord with the entropy value, the
network splits into two clusters, with spins nearly perfectly correlated within
each, but almost independent across. Middle panel: a good pairwise fit with
$S = 3.9$ bits and $\mathcal{D}^{(2)} = 0.032$. Correspondingly, four clusters
of different sizes are seen. However, now the spins also exhibit some
correlations across clusters, which presumably leads to the increase in
$\mathcal{D}^{(2)}$. Right panel: a network with $S = 5.7$ bits and
$\mathcal{D}^{(2)} = 0.16$, a bad (though not disastrous) pairwise fit. There
are now many small clusters, but correlations within and across the clusters
are not very different. (Both axes: spin No.)
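The link between a clean cluster structure and the entropy can be illustrated with a deliberately idealized case. The sketch below is not taken from the simulations: it assumes a hypothetical five-spin distribution in which spins 0 to 2 form one perfectly aligned cluster, spins 3 and 4 form another, and the two clusters are independent. It then computes $c_{ij}$ by direct summation, assuming no spin is completely frozen (otherwise the normalization would divide by zero).

```python
import math


def correlation_matrix(states, probs):
    """c_ij = cov(s_i, s_j) / (std s_i * std s_j), by direct summation over states."""
    n = len(states[0])
    mean = [sum(p * s[i] for s, p in zip(states, probs)) for i in range(n)]
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            second = sum(p * s[i] * s[j] for s, p in zip(states, probs))
            cov = second - mean[i] * mean[j]
            # For +/-1 spins the variance is 1 - mean^2 (assumed nonzero here).
            norm = math.sqrt((1 - mean[i] ** 2) * (1 - mean[j] ** 2))
            c[i][j] = cov / norm
    return c


# Hypothetical frozen network with two independent, internally aligned clusters.
states = [(a, a, a, b, b) for a in (-1, 1) for b in (-1, 1)]
probs = [0.25] * 4

entropy = -sum(p * math.log2(p) for p in probs)
c = correlation_matrix(states, probs)
print("entropy (bits):", entropy)
for row in c:
    print([round(abs(x), 2) for x in row])
```

In this idealized case the matrix of $|c_{ij}|$ is exactly block diagonal and the entropy is 2 bits, one bit per cluster, which mirrors the two-cluster structure of the left panel of Fig. 5.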

For 3-spin networks, in addition to the pairwise interactions, there are
also nonzero single spin biases in the MaxEnt fits. Thus the entropy and
correlations among spins are generally smaller for the same
$\mathcal{D}^{(2)}$. Nonetheless, as seen in Fig. 6, the (fuzzy) cluster
structure for these networks is not that much different from the 4-spin case.

To further explore the network landscapes, we point out that an inferred
symmetric $J_{ij}$ can be rewritten as

$$J_{ij} = \sum_{\nu=1}^{N} \epsilon_\nu \xi_i^\nu \xi_j^\nu, \qquad (10)$$

where $\epsilon_\nu$ and $\xi^\nu$ are the eigenvalues and the eigenvectors,
correspondingly. The eigenvalues take both large positive and large negative
values for the pairwise sufficient networks (cf. Fig. 4). The negatives
correspond to wells in the landscape, and the positives correspond to peaks. If
the wells and the peaks were clearly separated, then spin configurations in the
vicinity of the well center, distinct from it by just a single spin flip, would
have similar high probabilities (this is what allows an eigenvector to act as a
broad attractor in the Hopfield network). The repulsive peaks far away from the
wells would have little effect on $\mathcal{D}^{(2)}$ since the probability of
states away from the wells is small in the low temperature regime even without
the peaks. In contrast, were spins to form tight clusters, flipping a single
spin would not be allowed. Peaks would be needed to decrease the probability of
such cluster-breaking states, and thus positive eigenvalues would affect
$\mathcal{D}^{(2)}$ strongly.

Fig. 6 A 3-spin network also decouples into spin clusters in the pairwise
sufficient regime. For brevity, here we show $|c_{ij}|$ only for one network
similar to the middle panel in Fig. 5: $S \approx 3.5$, [...] $\approx 0.58$,
and $\mathcal{D}^{(2)} \approx 0.038$. Four to seven partially overlapping
clusters can be seen.

To verify which of the two scenarios holds, for 4-spin networks, we
construct the coupling matrix and the pairwise MaxEnt distribution from only
$n \le N$ eigenvalues,

$$J_{ij}(n) = \sum_{\nu=1}^{n} \epsilon_\nu \xi_i^\nu \xi_j^\nu, \qquad (11)$$

$$Q^{(2)}(n) = \frac{1}{Z} \exp\Big( -\sum_{ij} J_{ij}(n)\, \sigma_i \sigma_j \Big). \qquad (12)$$

We then evaluate $\mathcal{D}^{(2)}$ between $P$ and $Q^{(2)}(n)$ as a function
of $n$. Figure 7 shows this dependence for a typical pairwise-sufficient
distribution and for two different ways of including eigenvalues into $J$. In
the first, we proceed from the most negative eigenvalue to the most positive
one. In the second, we proceed from the largest magnitude eigenvalue to the
smallest one. Since sorting by magnitude (which includes large positive
eigenvalues earlier) approaches the terminal $\mathcal{D}^{(2)}$ faster, wells
and peaks must both affect close spin configurations. This is again consistent
with the clustering picture.


2.4 The mechanism of emergence of the pairwise sufficiency 

The clustered structure of the network landscapes allows us to propose a
hypothesis for why densely coupled p-spin networks exhibit the pairwise
sufficiency. We re-group terms in the energies, which define $P_3$ and $P_4$ in
Eqs. (4, 5). For example, all terms that couple $\sigma_i$ and $\sigma_j$ for
the 4-spin network can be rewritten as

$$\sigma_i \sigma_j \sum_{\mu} K_\mu\, \delta_{i \mu_1} \delta_{j \mu_2}\, \sigma_{\mu_3} \sigma_{\mu_4} \equiv J_{ij}\, \sigma_i \sigma_j, \qquad (13)$$

where $\delta$ is a Kronecker delta. (Here we slightly abused the notation and
imposed that $i, j$ only occur in $\mu_1$ and $\mu_2$.) This equation defines a
random coupling $J_{ij}$, which depends both on the current network state and on
the quenched randomness that went into building the network. For a large
number of couplings, fluctuations in $J_{ij}$ will be large enough so that,
averaged over the accessible network states, $J_{ij}$ stays far from zero
compared to its standard deviation $\sigma_{J_{ij}}$. This creates large
effective pairwise coupling among spins, so that clusters of spins start
behaving coherently. Then the state of every spin in the cluster can be defined
by choosing a cluster representative, setting its value, and then coupling each
cluster member to the representative through a pairwise interaction; higher
order couplings are not needed! The pairwise MaxEnt fit is nearly exact, even
though the network is far from frozen since values of the cluster
representatives are not necessarily constrained. We illustrate this in Fig. 8,
which shows that higher order couplings average to produce large effective
pairwise interactions for correlated spins.

Fig. 7 Positive eigenvalues of the MaxEnt coupling matrix $J_{ij}$ contribute
to the approximation error. We plot the fit error, $\mathcal{D}^{(2)}$, as a
function of the number of the eigenvalues of $J_{ij}$ included in the fit for a
4-spin distribution with $N = 21$, $\mathcal{S} \approx 0.1$, and
$\mathcal{D}^{(2)} \approx 0.014$. The blue line includes eigenvalues in the
order from the most negative to the most positive, and the red one includes
them in the order of their absolute values. The red line reaches the limiting
value of $\mathcal{D}^{(2)}$ quicker, while the blue one requires inclusion of
all eigenvalues for this to happen. As explained in the text, this is a
signature of emergence of spin clusters.

3 Discussion 


In this numerical study, we showed that pairwise MaxEnt models are more
effective in approximating random p-spin networks ($p = 3, 4$) than one would
naively expect. Even in the worst cases, the error of such models was rarely
above $\mathcal{D}^{(2)} \sim 0.3$, and it was much lower for densely coupled
networks, with lower entropy per spin. We traced the emerging pairwise
sufficiency to formation of coherent clusters of spins, largely decoupled among
themselves, resulting in a multitude of dependent attractors for the system.
This is not a perturbative effect and is a new proposal for explaining pairwise
sufficiency. Such collective behavior introduces substantial redundancy, and
would allow error correction. However, this error correction is of a very
different nature compared to, for example, the Hopfield network [31].

Fig. 8 Formation of effective couplings in 4-spin networks. For the same
network as in Fig. 5, middle, we plot $\langle J_{ij} \rangle$ (left) and
$\langle J_{ij} \rangle$ in units of the standard deviation (right) for each
pair of spins vs. $c_{ij}$. For correlated spins, higher order coupling terms
add up, on average, to strong pairwise couplings. Negative $J_{ij}$'s typically
correspond to positive correlations (and vice versa), as expected.

Does the mechanism presented here explain the pairwise sufficiency in 
any real biological system? This is unclear since our analysis was limited 
to specific simulated networks, which may or may not be good models of 
real biology. Specifically, the network in the original paper that observed 
the pairwise sufficiency m had much smaller entropy per spin (neurons 
rarely fired), and correlations among spins rarely exceeded 0.2. In contrast, 
while 3-spin networks in our simulations had smaller S and smaller spin- 
spin correlations than their 4-spin counterparts, these numbers were still 
larger than those in the experiments. At the same time, pairwise MaxEnt 
models do not fit experimental data perfectly (certainly worse than some 
of our nearly perfect fits) [15]. It may be that some structural features of 
real systems allow them to operate at higher $\mathcal{D}^{(2)}$ for smaller S compared 
to the simple models we investigated here — and exploring a wider class of 
networks for signatures of behaviors that we observed would be the next step. 
This is especially important since large coherent deviations from the most 
probable state into 10 to 100 metastable states seem to be a crucial feature 
of many experimental systems (such as bursts of neural activity in the retina 
that predominantly stays quiet [15]). Such metastable states far away from 
the ground state at least resemble the models that we studied. In addition, 
MaxEnt models in other fields may have very different properties compared 
to those in neuroscience, including different typical entropies and correlation 
strengths. Therefore, we hope that our models and their generalizations will 
be able to inform interpretation of experimental data, even if they do not match 
the experiments in some important properties. 

With (approximate) pairwise sufficiency seen in many collective biological 
phenomena, it is important to ask why these systems operate in the regime 
that allows it to hold. Indeed, within our model, the pairwise sufficiency 
is not generic: low $\mathcal{D}^{(2)}$ happens only for small S, and preferentially when 
the strength of the interactions is high, $\alpha = Ms/N \gg 1$ (cf. Fig. 2). The 
need for redundancy and error correction is a potential explanation, but 
there is no obvious reason why the redundancy must result in the pairwise 
sufficiency (indeed, simple parity-based codes probably do not). Taking the 
improvement in $\mathcal{D}^{(2)}$ with the increase in $\alpha$ seriously, we propose a different 
explanation (a similar argument was first suggested in Ref. [52]). 
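The remark about parity-based codes can be made concrete with a minimal sketch, which assumes the simplest such code: three spins constrained to have product +1, with all four code words equally likely. Every mean and every pairwise correlation of this distribution vanishes, so the pairwise MaxEnt model that matches them is the uniform distribution over all eight states, which misses the parity structure entirely. The function name and the choice of three spins are illustrative assumptions.

```python
import itertools
import numpy as np

def parity_code_stats(n=3):
    """Single-spin means, off-diagonal pairwise correlations, and the KL
    divergence (in bits) from the uniform distribution, for a toy code that
    is uniform over the states of n spins with product +1."""
    states = np.array(list(itertools.product([-1, 1], repeat=n)))
    in_code = states.prod(axis=1) == 1
    p = in_code / in_code.sum()               # uniform over the code words
    means = p @ states                        # <x_i>
    pair = states.T @ (states * p[:, None])   # <x_i x_j>
    off_diag = pair[~np.eye(n, dtype=bool)]
    # With all constrained moments equal to zero, the maximum entropy
    # distribution consistent with them is uniform over all 2^n states.
    q = np.full(len(states), 1.0 / len(states))
    kl_bits = float(np.sum(p[in_code] * np.log2(p[in_code] / q[in_code])))
    return means, off_diag, kl_bits

means, off_diag, kl_bits = parity_code_stats()
```

For this toy code the divergence between the true distribution and its pairwise approximation is one bit, the full information carried by the parity constraint, so redundancy alone does not guarantee pairwise sufficiency.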

One can view evolution as trying to satisfy a growing list of constraints 
imposed upon a biological network by its interactions with the environment. 
These constraints can include efficient information processing, low energy 
consumption, robustness to perturbations, fitting within a certain physical 
size, responding quickly enough so that actions are relevant in the changing 
world, etc. Some of these global constraints may be equivalent to a large 
number of local constraints. For example, efficient information transmission 
in the visual system typically includes removal of redundancy present in the 
natural stimuli [33], which is equivalent to a multitude of constraints on 
activities of nearby neurons. When constraints are added, fewer and fewer states 
of the network remain accessible. Importantly, at least for certain abstract 
constraint satisfaction problems [30, 34], before there are no more states left, 
the accessible states organize themselves in a handful of small, well-separated 
groups. Whether these states are uncorrelated, or consist of collective flipping 
of clusters of spins, they can be well represented by pairwise MaxEnt 
models. (In the former case, such a MaxEnt model would have a Hopfield network 
structure E2; in the latter, pairwise interactions would determine cluster 
assignments.) Therefore, it can be that the pairwise sufficiency is a signature 
of a biological network nearing the unsatisfiability threshold, being pushed 
towards it by evolution. Exploring landscapes of satisfiability problems with 
more realistic ensembles of constraints (or interactions) and comparing them 
to the landscapes observed in experiments would address this hypothesis. 
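The shrinking of the accessible state space as constraints accumulate can be illustrated by brute-force enumeration on a small network. The sketch below is only an assumption-laden toy, not the ensemble studied here: each hypothetical constraint picks `p` random spins and forbids one random sign pattern on them (a random satisfiability clause), and the function returns the number of states that still satisfy all constraints after each addition. It does not attempt to detect the grouping of the surviving states.

```python
import itertools
import numpy as np

def count_accessible_states(n_spins=10, n_constraints=60, p=3, seed=0):
    """Number of states satisfying all constraints after each one is added.
    Each toy constraint forbids one random sign pattern on p random spins."""
    rng = np.random.default_rng(seed)
    states = np.array(list(itertools.product([-1, 1], repeat=n_spins)))
    allowed = np.ones(len(states), dtype=bool)
    counts = [int(allowed.sum())]              # all 2^n states at the start
    for _ in range(n_constraints):
        spins = rng.choice(n_spins, size=p, replace=False)
        forbidden = rng.choice([-1, 1], size=p)
        violates = (states[:, spins] == forbidden).all(axis=1)
        allowed &= ~violates
        counts.append(int(allowed.sum()))
    return counts

counts = count_accessible_states()
```

Plotting `counts` against the number of constraints per spin gives a rough picture of how quickly the accessible set shrinks; studying how the surviving states are grouped would require an additional analysis of their mutual distances.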